Goto

Collaborating Authors

 ln 4



Active Learning with Oracle Epiphany

Neural Information Processing Systems

We present a theoretical analysis of active learning with more realistic interactions with human oracles. Previous empirical studies have shown oracles abstaining on difficult queries until accumulating enough information to make label decisions. We formalize this phenomenon with an "oracle epiphany model" and analyze active learning query complexity under such oracles for both the realizable and the agnostic cases. Our analysis shows that active learning is possible with oracle epiphany, but incurs an additional cost depending on when the epiphany happens. Our results suggest new, principled active learning approaches with realistic oracles.





Are Greedy Task Orderings Better Than Random in Continual Linear Regression?

arXiv.org Artificial Intelligence

We analyze task orderings in continual learning for linear regression, assuming joint realizability of training data. We focus on orderings that greedily maximize dissimilarity between consecutive tasks, a concept briefly explored in prior work but still surrounded by open questions. Using tools from the Kaczmarz method literature, we formalize such orderings and develop geometric and algebraic intuitions around them. Empirically, we demonstrate that greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks. Analytically, in a high-rank regression setting, we prove a loss bound for greedy orderings analogous to that of random ones. However, under general rank, we establish a repetition-dependent separation. Specifically, while prior work showed that for random orderings, with or without replacement, the average loss after $k$ iterations is bounded by $\mathcal{O}(1/\sqrt{k})$, we prove that single-pass greedy orderings may fail catastrophically, whereas those allowing repetition converge at rate $\mathcal{O}(1/\sqrt[3]{k})$. Overall, we reveal nuances within and between greedy and random orderings.


Memorizing Long-tail Data Can Help Generalization Through Composition

arXiv.org Machine Learning

The relationship between memorization and generalization has always been an intriguing topic in deep learning. It has long been known that neural networks used for supervised learning can memorize noisy or even random labels (Zhang et al., 2017), and recent large language models can still memorize long text despite not being overparametrized (Carlini et al., 2019, 2022). Conventional wisdom from statistical learning theory suggests that memorization might be detrimental to generalization performance, yet neural networks often generalize well with memorization, sometimes even better compared to networks that do not memorize. Trying to understand this relationship from a theoretical perspective has led to the idea of implicit regularization and benign overfitting (Belkin et al., 2019; Bartlett et al., 2020; Belkin et al., 2020; Hastie et al., 2022; Gunasekar et al., 2017), where the training process and architecture choices of neural networks prefer certain solutions that have good generalization despite memorizing the training data. Another interesting line of work (Feldman, 2020; Feldman and Zhang, 2020) demonstrated that memorization can actively help generalization by capturing long-tail behaviors in the training data.


PAC Learnability in the Presence of Performativity

arXiv.org Machine Learning

Following the wide-spread adoption of machine learning models in real-world applications, the phenomenon of performativity, i.e. model-dependent shifts in the test distribution, becomes increasingly prevalent. Unfortunately, since models are usually trained solely based on samples from the original (unshifted) distribution, this performative shift may lead to decreased test-time performance. In this paper, we study the question of whether and when performative binary classification problems are learnable, via the lens of the classic PAC (Probably Approximately Correct) learning framework. We motivate several performative scenarios, accounting in particular for linear shifts in the label distribution, as well as for more general changes in both the labels and the features. We construct a performative empirical risk function, which depends only on data from the original distribution and on the type performative effect, and is yet an unbiased estimate of the true risk of a classifier on the shifted distribution. Minimizing this notion of performative risk allows us to show that any PAC-learnable hypothesis space in the standard binary classification setting remains PAC-learnable for the considered performative scenarios. We also conduct an extensive experimental evaluation of our performative risk minimization method and showcase benefits on synthetic and real data.


Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization

arXiv.org Machine Learning

This paper addresses the Bayesian optimization problem (also referred to as the Bayesian setting of the Gaussian process bandit), where the learner seeks to minimize the regret under a function drawn from a known Gaussian process (GP). Under a Matérn kernel with a certain degree of smoothness, we show that the Gaussian process upper confidence bound (GP-UCB) algorithm achieves $\tilde{O}(\sqrt{T})$ cumulative regret with high probability. Furthermore, our analysis yields $O(\sqrt{T \ln^2 T})$ regret under a squared exponential kernel. These results fill the gap between the existing regret upper bound for GP-UCB and the best-known bound provided by Scarlett (2018). The key idea in our proof is to capture the concentration behavior of the input sequence realized by GP-UCB, enabling a more refined analysis of the GP's information gain.